Evolving a Behavioral Repertoire for a Walking Robot
Numerous algorithms have been proposed to allow legged robots to learn to
walk. However, the vast majority of these algorithms are devised to learn to
walk in a straight line, which is not sufficient to accomplish any real-world
mission. Here we introduce the Transferability-based Behavioral Repertoire
Evolution algorithm (TBR-Evolution), a novel evolutionary algorithm that
simultaneously discovers several hundred simple walking controllers, one
for each possible direction. By taking advantage of solutions that are usually
discarded by evolutionary processes, TBR-Evolution is substantially faster than
independently evolving each controller. Our technique relies on two methods:
(1) novelty search with local competition, which searches for both
high-performing and diverse solutions, and (2) the transferability approach,
which combines simulations and real tests to evolve controllers for a physical
robot. We evaluate this new technique on a hexapod robot. Results show that
with only a few dozen short experiments performed on the robot, the algorithm
learns a repertoire of controllers that allows the robot to reach every point
in its reachable space. Overall, TBR-Evolution opens a new kind of learning
algorithm that simultaneously optimizes all the achievable behaviors of a
robot.
Comment: 33 pages; Evolutionary Computation Journal 201
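The first of the two methods, novelty search with local competition, can be illustrated with a minimal sketch. Here each behaviour is reduced to a single scalar descriptor (e.g. a walking direction) with a scalar fitness; the choice of `k`, the distance metric, and the scoring are illustrative assumptions, not the paper's exact formulation.

```python
def novelty_and_local_competition(pop, k=2):
    """For each (descriptor, fitness) pair, compute novelty as the mean
    distance to its k nearest neighbours in behaviour space, and a local
    competition score as the number of those neighbours it outperforms."""
    scores = []
    for i, (b, f) in enumerate(pop):
        # k nearest neighbours of solution i in behaviour (descriptor) space
        neighbours = sorted(
            (abs(b - b2), f2) for j, (b2, f2) in enumerate(pop) if j != i
        )[:k]
        novelty = sum(d for d, _ in neighbours) / k
        local_comp = sum(1 for _, f2 in neighbours if f > f2)
        scores.append((novelty, local_comp))
    return scores
```

Selecting on both scores keeps solutions that are good relative to their behavioural neighbours even when they are globally outperformed, which is how a repertoire covering many directions can be filled in a single run.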
Fast Damage Recovery in Robotics with the T-Resilience Algorithm
Damage recovery is critical for autonomous robots that need to operate for a
long time without assistance. Most current methods are complex and costly
because they require anticipating each potential damage in order to have a
contingency plan ready. As an alternative, we introduce the T-resilience
algorithm, a new algorithm that allows robots to quickly and autonomously
discover compensatory behaviors in unanticipated situations. This algorithm
equips the robot with a self-model and discovers new behaviors by learning to
avoid those that perform differently in the self-model and in reality. Our
algorithm thus does not identify the damaged parts but it implicitly searches
for efficient behaviors that do not use them. We evaluate the T-Resilience
algorithm on a hexapod robot that needs to adapt to leg removal, broken legs
and motor failures; we compare it to stochastic local search, policy gradient
and the self-modeling algorithm proposed by Bongard et al. The behavior of the
robot is assessed on board using an RGB-D sensor and a SLAM algorithm. Using
only 25 tests on the robot and an overall running time of 20 minutes,
T-Resilience consistently leads to substantially better results than the other
approaches.
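The core idea, preferring behaviours whose self-model predictions agree with reality, can be sketched as follows. The 1-D descriptor space, the inverse-distance transferability estimate, and the multiplicative score are simplifying assumptions for illustration, not the exact estimator used in the paper.

```python
def transferability(candidate, tested):
    """Estimate sim-to-reality agreement of a candidate behaviour from the
    nearest behaviour already tested on the real robot (1-D descriptors)."""
    nearest = min(tested, key=lambda b: abs(b - candidate))
    sim, real = tested[nearest]
    return 1.0 / (1.0 + abs(sim - real))

def select_behaviour(candidates, sim_perf, tested):
    # favour behaviours that perform well in the self-model AND lie in a
    # region where the self-model has proven trustworthy in reality
    return max(candidates,
               key=lambda c: sim_perf[c] * transferability(c, tested))
```

A behaviour that scores slightly lower in simulation but sits in a well-transferring region beats a higher-scoring behaviour whose neighbourhood the self-model gets wrong, which is how damaged parts are implicitly avoided.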
MAGAN: Margin Adaptation for Generative Adversarial Networks
We propose the Margin Adaptation for Generative Adversarial Networks (MAGAN)
algorithm, a novel training procedure for GANs to improve stability and
performance by using an adaptive hinge loss function. We estimate the
appropriate hinge loss margin with the expected energy of the target
distribution, and derive principled criteria for when to update the margin. We
prove that our method converges to its global optimum under certain
assumptions. Evaluated on the task of unsupervised image generation, the
proposed training procedure is simple yet robust on a diverse set of data, and
achieves qualitative and quantitative improvements compared to the
state of the art.
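As a rough sketch of the idea (not the paper's exact, principled update criteria): the discriminator acts as an energy function, the hinge margin for fake samples is tied to the expected energy of the real data, and the margin is lowered as the discriminator improves.

```python
def hinge_discriminator_loss(real_energies, fake_energies, margin):
    """Energy-based hinge loss: push real energy down, and fake energy up
    until it reaches the margin."""
    real = sum(real_energies) / len(real_energies)
    fake = sum(max(0.0, margin - e) for e in fake_energies) / len(fake_energies)
    return real + fake

def adapt_margin(margin, expected_real_energy):
    # heavily simplified adaptation rule: shrink the margin towards the
    # expected energy of the real distribution, never increase it
    return min(margin, expected_real_energy)
```

Tying the margin to the real-data energy keeps the hinge informative as training progresses, instead of leaving a fixed margin that the generator quickly saturates.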
Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains
Quality-Diversity optimisation (QD) has proven to yield promising results
across a broad set of applications. However, QD approaches struggle in the
presence of uncertainty in the environment, as it impacts their ability to
quantify the true performance and novelty of solutions. This problem has been
highlighted multiple times independently in previous literature. In this work,
we propose to uniformise the view on this problem through four main
contributions. First, we formalise a common framework for uncertain domains:
the Uncertain QD setting, a special case of QD in which fitness and descriptors
for each solution are no longer fixed values but distributions over possible
values. Second, we propose a new methodology to evaluate Uncertain QD
approaches, relying on a new per-generation sampling budget and a set of
existing and new metrics specifically designed for Uncertain QD. Third, we
propose three new Uncertain QD algorithms: Archive-sampling,
Parallel-Adaptive-sampling and Deep-Grid-sampling. We propose these approaches
taking into account recent advances in the QD community toward the use of
hardware acceleration that enable large numbers of parallel evaluations and
make sampling an affordable approach to uncertainty. Our final and fourth
contribution is to use this new framework and the associated comparison methods
to benchmark existing and novel approaches. We demonstrate once again the
limitation of MAP-Elites in uncertain domains and highlight the performance of
the existing Deep-Grid approach, and of our new algorithms. The goal of this
framework and methods is to become an instrumental benchmark for future works
considering Uncertain QD.
Comment: Submitted to Transactions on Evolutionary Computation
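Of the three proposed algorithms, Archive-sampling is the simplest to sketch: re-evaluate every archived solution each generation (affordable when evaluations run in parallel on accelerators) and keep the averages, so archived fitness and descriptors approximate their expected values. The evaluation interface below is an illustrative assumption.

```python
def archive_sampling_step(archive, evaluate, n_samples=5):
    """Re-sample every archived solution and store the mean of its noisy
    fitness and (1-D) descriptor over n_samples evaluations."""
    resampled = {}
    for sol in archive:
        fits, descs = zip(*(evaluate(sol) for _ in range(n_samples)))
        resampled[sol] = (sum(fits) / n_samples, sum(descs) / n_samples)
    return resampled
```

With a stochastic `evaluate`, the stored values converge towards the true expectations as `n_samples` grows, which is exactly the quantity an Uncertain QD archive should contain.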
Discovering Unsupervised Behaviours from Full-State Trajectories
Improving open-ended learning capabilities is a promising approach to enable
robots to face the unbounded complexity of the real-world. Among existing
methods, the ability of Quality-Diversity algorithms to generate large
collections of diverse and high-performing skills is instrumental in this
context. However, most of those algorithms rely on a hand-coded behavioural
descriptor to characterise the diversity, hence requiring prior knowledge about
the considered tasks. In this work, we propose an additional analysis of
Autonomous Robots Realising their Abilities; a Quality-Diversity algorithm that
autonomously finds behavioural characterisations. We evaluate this approach on
a simulated robotic environment, where the robot has to autonomously discover
its abilities from its full-state trajectories. All algorithms were applied to
three tasks: navigation, moving forward with a high velocity, and performing
half-rolls. The experimental results show that the algorithm under study
discovers autonomously collections of solutions that are diverse with respect
to all tasks. More specifically, the analysed approach autonomously finds
policies that not only move the robot to diverse positions, but also use its
legs in diverse ways, and even perform half-rolls.
Comment: Published at the Workshop on Agent Learning in Open-Endedness (ALOE)
at ICLR 2022. arXiv admin note: substantial text overlap with
arXiv:2204.0982
Benchmark tasks for Quality-Diversity applied to Uncertain domains
While standard approaches to optimisation focus on producing a single
high-performing solution, Quality-Diversity (QD) algorithms allow large diverse
collections of such solutions to be found. While QD has proven promising across a
large variety of domains, it still struggles when faced with uncertain domains,
where quantification of performance and diversity are non-deterministic.
Previous work in Uncertain Quality-Diversity (UQD) has proposed methods and
metrics designed for such uncertain domains. In this paper, we propose a first
set of benchmark tasks to analyse and estimate the performance of UQD
algorithms. We identify the key uncertainty properties to easily define UQD
benchmark tasks: the uncertainty location, the type of distribution and its
parameters. By varying the nature of those key UQD components, we introduce a
set of 8 easy-to-implement and lightweight tasks, split into 3 main categories.
All our tasks build on the Redundant Arm: a common QD environment that is
lightweight and easily replicable. Each one of these tasks highlights one
specific limitation that arises when considering UQD domains. With this first
benchmark, we hope to facilitate later advances in UQD.
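A minimal sketch of how such tasks can be built: a deterministic Redundant Arm task wrapped with noise whose location (fitness or descriptor) and distribution parameters are configurable, mirroring the key uncertainty properties above. The concrete fitness and descriptor definitions are common simplified choices for the arm, not necessarily the benchmark's exact ones.

```python
import math
import random

def arm_task(angles):
    """Deterministic planar redundant arm with unit links: fitness is the
    negated variance of joint angles, descriptor is the end-effector (x, y)."""
    mean = sum(angles) / len(angles)
    fitness = -sum((a - mean) ** 2 for a in angles) / len(angles)
    cum, x, y = 0.0, 0.0, 0.0
    for a in angles:
        cum += a
        x += math.cos(cum)
        y += math.sin(cum)
    return fitness, (x, y)

def make_uncertain(task, where="fitness", sigma=0.05, rng=random):
    """Wrap a deterministic task with Gaussian noise on either the fitness
    or the descriptor (the 'uncertainty location')."""
    def noisy(angles):
        f, (x, y) = task(angles)
        if where == "fitness":
            return f + rng.gauss(0.0, sigma), (x, y)
        return f, (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
    return noisy
```

Varying `where`, the distribution, and `sigma` yields a family of lightweight, easily replicable uncertain tasks from one deterministic base.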
Efficient Exploration using Model-Based Quality-Diversity with Gradients
Exploration is a key challenge in Reinforcement Learning, especially in
long-horizon, deceptive and sparse-reward environments. For such applications,
population-based approaches have proven effective. Methods such as
Quality-Diversity deal with this by encouraging novel solutions and producing
a diversity of behaviours. However, these methods are driven by either
undirected sampling (i.e. mutations) or use approximated gradients (i.e.
Evolution Strategies) in the parameter space, which makes them highly
sample-inefficient. In this paper, we propose a model-based Quality-Diversity
approach. It extends existing QD methods to use gradients for efficient
exploitation and leverage perturbations in imagination for efficient
exploration. Our approach optimizes all members of a population simultaneously
to maintain both performance and diversity efficiently by leveraging the
effectiveness of QD algorithms as good data generators to train deep models. We
demonstrate that it maintains the divergent search capabilities of
population-based approaches on tasks with deceptive rewards while significantly
improving their sample efficiency and quality of solutions.
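A minimal sketch of the "perturbations in imagination" idea: perturb current policies, score the perturbed candidates in a learned model instead of the real environment, and promote only the most promising ones to real evaluation. The screening interface below is an illustrative assumption, not the paper's architecture.

```python
def imagined_screen(policies, perturb, model_score, n_perturb=4, top_k=2):
    """Generate perturbed candidates for each policy, evaluate them in a
    learned model ('imagination'), and return the top_k for real rollout."""
    candidates = [perturb(p) for p in policies for _ in range(n_perturb)]
    return sorted(candidates, key=model_score, reverse=True)[:top_k]
```

Because the screening happens entirely in the model, most mutations never cost a real environment interaction, which is where the sample-efficiency gain comes from.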
Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors
Data-driven learning based methods have recently been particularly successful
at learning robust locomotion controllers for a variety of unstructured
terrains. Prior work has shown that incorporating good locomotion priors in the
form of trajectory generators (TGs) is effective at efficiently learning
complex locomotion skills. However, defining a good, single TG as
tasks/environments become increasingly complex remains a challenging
problem as it requires extensive tuning and risks reducing the effectiveness of
the prior. In this paper, we present Evolved Environmental Trajectory
Generators (EETG), a method that learns a diverse set of specialised locomotion
priors using Quality-Diversity algorithms while maintaining a single policy
within the Policies Modulating TG (PMTG) architecture. The results demonstrate
that EETG enables a quadruped robot to successfully traverse a wide range of
environments, such as slopes, stairs, rough terrain, and balance beams. Our
experiments show that learning a diverse set of specialized TG priors is
significantly (5 times) more efficient than using a single, fixed prior when
dealing with a wide range of environments.
Learning to Walk Autonomously via Reset-Free Quality-Diversity
Quality-Diversity (QD) algorithms can discover large and complex behavioural
repertoires consisting of both diverse and high-performing skills. However, the
generation of behavioural repertoires has mainly been limited to simulation
environments instead of real-world learning. This is because existing QD
algorithms need large numbers of evaluations as well as episodic resets, which
require manual human supervision and interventions. This paper proposes
Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous
learning for robotics in open-ended environments. We build on Dynamics-Aware
Quality-Diversity (DA-QD) and introduce a behaviour selection policy that
leverages the diversity of the imagined repertoire and environmental
information to intelligently select behaviours that can act as automatic
resets. We demonstrate this through a task of learning to walk within defined
training zones with obstacles. Our experiments show that we can learn full
repertoires of legged locomotion controllers autonomously without manual resets
with high sample efficiency in spite of harsh safety constraints. Finally,
using an ablation of different target objectives, we show that it is important
for RF-QD to have diverse types of solutions available for the behaviour selection
policy over solutions optimised with a specific objective. Videos and code
available at https://sites.google.com/view/rf-qd
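The behaviour selection idea can be sketched in a few lines: among behaviours in the imagined repertoire, each predicting a planar displacement, pick the one whose predicted end position lies closest to the centre of the training zone, so the chosen behaviour doubles as an automatic reset. The 2-D displacement descriptors and the distance criterion are illustrative assumptions.

```python
def select_reset_behaviour(repertoire, position, zone_centre):
    """Choose the behaviour whose predicted displacement brings the robot
    closest to the training-zone centre (acting as an automatic reset)."""
    def dist2(delta):
        x = position[0] + delta[0] - zone_centre[0]
        y = position[1] + delta[1] - zone_centre[1]
        return x * x + y * y
    return min(repertoire, key=dist2)
```

This is precisely where repertoire diversity matters: a repertoire optimised for a single objective may contain no behaviour that moves the robot back towards the zone at all.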
DR 3.2: First prototype of user model construction algorithms featuring input from sentiment and interaction mining
This document describes the first prototype of the
PAL user modeling algorithm based on sentiment and interaction developed
during the second year of the project in work package 3.
The overall objective of work package 3 is to adapt the behavior of the
PAL system to each of its users. This adaptation is necessary to ensure
1) engagement of the user and 2) increased effectiveness in personal goal
achievement. People tend to adapt their interaction style in a conversation
based on observations such as the perceived sentiment of the conversation
partner. For this purpose it is worthwhile to include sentiment in the
interaction process, such that the robot or its avatar can adapt its behavior
accordingly. Sentiment extracted from the child's diaries can assist in
estimating the current well-being and emotional state of the child. Moreover,
sentiment can be used as a feedback mechanism to fine-tune the user model